98 research outputs found

    Lectometry and latent variables

    Ever since its first formulation in Geeraerts, Grondelaers & Speelman (1999), lectometry has been widely used to map distances between language varieties or ‘lects’. Often, these distances are given a geometrical representation in a low-dimensional space. Examples are the use of Multidimensional Scaling in Speelman, Grondelaers & Geeraerts (2003) and Ruette, Geeraerts, Peirsman & Speelman (2014) and of Correspondence Analysis in Plevoets (2008), Delaere, De Sutter & Plevoets (2012), Prieels, Delaere, Plevoets & De Sutter (2015) and Ghyselen (2016). Usually, the number of dimensions of the geometrical space is chosen on the basis of representativeness, leading to an approximate picture of the linguistic variation. However, the spatial dimensions can also be interpreted as underlying factors governing the variability of the data. This methodological paper explores this functional interpretation of the geometrical dimensions by establishing the link between lectometry and Latent Variable Models. It will be shown that the dimensions of the lectal space can be considered hidden variables which lay bare specific causal mechanisms. In particular, analyses of translation and interpreting data will demonstrate that the lectometrical dimensions can be made to correspond to various socio-cultural determinants. That opens up the possibility for lectometrical studies to determine the ‘social meaning’ of linguistic varieties and variants.
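The low-dimensional maps mentioned above can be illustrated with classical (Torgerson) Multidimensional Scaling, which embeds a matrix of pairwise distances between lects in a small number of dimensions. The sketch below is a minimal illustration on an invented toy distance matrix, not data from the paper:

```python
import numpy as np

# Hypothetical pairwise distances between four 'lects' (toy data).
D = np.array([
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.6],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.6, 0.3, 0.0],
])

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # indices of the top-k eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * scale           # n x k coordinates

coords = classical_mds(D, k=2)
```

Each row of `coords` is a lect's position in the two-dimensional lectal space; under the latent-variable reading discussed in the abstract, each of the two axes can then be inspected as a candidate underlying factor.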

    Lexis or parsing? A corpus-based study of syntactic complexity and its effect on disfluencies in interpreting

    Cognitive load is probably one of the most cited topics in research on simultaneous interpreting, but it is still poorly understood due to the lack of proper empirical tests. It is a central concept in Gile’s (2009) Efforts Model as well as in Seeber’s (2011) Cognitive Load Model. Both models conceptualize interpreting as a dynamic equilibrium between the cognitive resources or capacities and the cognitive demands involved in listening and comprehension, production and memory storage. When the momentary demands exceed the interpreter’s available capacities, there is an information overload, which typically results in a disfluent or erroneous interpretation. While Gile (2008) denies that his Efforts Model is a theory that can be tested, Seeber & Kerzel (2012) put Seeber’s Cognitive Load Model to the test using pupillometry in an experimental interpretation task. In a series of recent corpus-based studies, Plevoets & Defrancq (2016, 2018) and Defrancq & Plevoets (2018) used filled pauses to investigate cognitive load in simultaneous interpreters, based on the widely shared assumption in the psycholinguistic literature that silent and filled pauses are ‘windows’ on cognitive load in monolingual speech (Arnold et al. 2000; Bortfeld et al. 2001; Clark & Fox Tree 2002; Levelt 1983; Watanabe et al. 2008). The studies found empirical support for increased cognitive load in simultaneous interpreting in the form of higher frequencies of filled pauses. However, the studies also showed that filled pauses in interpreting are caused mainly by problems with lexical retrieval. Plevoets & Defrancq (2016) observed that interpreters produce more instances of the filled pause uh(m) when the lexical density of their own output is higher. Plevoets & Defrancq (2018) demonstrated that the frequency of uh(m) in interpreting increases when the lexical density of the source text is higher but decreases when there are more formulaic sequences.
This effect of formulaicity was found in both the source texts and the target texts. Other known obstacles in interpreting, such as the presence of numbers and the rate of delivery, do not significantly affect the frequency of filled pauses (although source speech delivery rate reached significance in one of the analyses). These results point to the problematic retrieval or access of lexical items as the primary source of cognitive load for interpreters. Finally, in a study of filled pauses occurring between the members of morphological compounds, Defrancq & Plevoets (2018) showed that interpreters produced more uh(m)s than non-interpreters both when the average frequency of the compounds was high and when the average frequency of the component members was high. This again demonstrates that lexical retrieval, which is assumed to be easier for more frequent items, is hampered in interpreting. The present study critically examines the results of these previous studies by analyzing the effect of another non-lexical parameter on the production of filled pauses in interpreting, viz. syntactic complexity. Subordinating constructions are a well-known predictor of processing cost (cognitive load) in both L1 research (Gordon, Luper & Peterson 1986; Gordon & Luper 1989) and L2 research (Norris & Ortega 2009; Osborne 2011). In interpreting, however, Dillinger (1994) and Setton (1999: 270) did not find strong effects of the syntactic embedding of the source texts on the interpreters’ performance. As a consequence, this paper takes a closer look at syntactic complexity by incorporating the number of hypotactic clauses into the analysis. The study is corpus-based and makes use of both a corpus of interpreted language and a corpus of non-mediated speech. The corpus of interpreted language is the EPICG corpus, which was compiled at Ghent University between 2010 and 2013.
It consists of French, Spanish and Dutch interpreted speeches in the European Parliament from 2006 to 2008, transcribed according to the VALIBEL guidelines (Bachy et al. 2007). For the purposes of this study, a sub-corpus of French source speeches and their Dutch interpretations is used, amounting to a total of 140 000 words. This sub-corpus is annotated for lemmas, parts-of-speech and chunks (Van de Kauter et al. 2013), and it is sentence-aligned with WinAlign (SDL Trados WinAlign 2014). The corpus of non-mediated speech is the sub-corpus of political debates of the Spoken Dutch Corpus (Oostdijk 2000), which was compiled between 1998 and 2003 and is annotated for lemmas and parts-of-speech. The political sub-corpus contains 220 000 words of Netherlandic Dutch and 140 000 words of Belgian Dutch. The data are analysed with a Generalized Additive Mixed-effects Model (Wood 2017) in which the frequency of the disfluency uh(m) is predicted in relation to delivery rate, lexical density, percentage of numerals, formulaicity and syntactic complexity. Delivery rate is measured as the number of words per minute, lexical density as the number of content words per utterance length, percentage of numerals as the number of numerals per utterance length, and formulaicity as the number of n-grams per utterance length. The new predictor, syntactic complexity, is measured as the number of subordinate clauses per utterance length. Because all five predictors are numeric variables, their effects are modelled with smoothing splines, which automatically detect potential nonlinear patterns in the data. The observations are at the utterance level and are nested within the speeches, so possible between-speech variation is accounted for with a random factor. The preliminary results confirm the hypothesis: while lexical density and formulaicity show effects similar to those reported in previous research (positive and negative, respectively), the syntactic complexity of the source text is only borderline significant and the syntactic complexity of the target text is non-significant. There are some sporadic differences among certain types of subordinate clauses, but the general conclusion is that syntactic complexity is not as strong a trigger of cognitive load in interpreting as lexically related factors. This calls for a model of interpreting in which depth of processing plays only a marginal role.
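The per-utterance predictors described in the abstract are all counts normalized by utterance length. A minimal sketch of how such measures can be computed from POS-tagged tokens is given below; the tag labels, the list of Dutch subordinators, and counting subordinate clauses by their subordinating conjunctions are simplifying assumptions for illustration, not the study's actual chunk-based annotation scheme:

```python
def utterance_measures(tagged_tokens,
                       content_pos=frozenset({"NOUN", "VERB", "ADJ", "ADV"}),
                       subordinators=frozenset({"dat", "omdat", "terwijl", "hoewel"})):
    """Per-utterance predictors: each raw count divided by utterance length."""
    n = len(tagged_tokens)
    content = sum(1 for word, pos in tagged_tokens if pos in content_pos)
    numerals = sum(1 for word, pos in tagged_tokens if pos == "NUM")
    clauses = sum(1 for word, pos in tagged_tokens if word.lower() in subordinators)
    return {"lexical_density": content / n,        # content words / length
            "pct_numerals": numerals / n,          # numerals / length
            "syntactic_complexity": clauses / n}   # subordinate clauses / length

# Toy Dutch utterance: "ik denk dat drie tolken werken"
example = [("ik", "PRON"), ("denk", "VERB"), ("dat", "SCONJ"),
           ("drie", "NUM"), ("tolken", "NOUN"), ("werken", "VERB")]
measures = utterance_measures(example)
```

In the study itself these utterance-level values then enter the model as smooth terms, so no linearity assumption is imposed on their relation to the uh(m) rate.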

    The effect of informational load on disfluencies in interpreting: a corpus-based regression analysis

    This article attempts to measure the cognitive or informational load in interpreting by modelling the occurrence rate of the speech disfluency uh(m). In a corpus of 107 interpreted and 240 non-interpreted texts, informational load is operationalized in terms of four measures: delivery rate, lexical density, percentage of numerals, and average sentence length. The occurrence rate of the disfluency was modelled using a rate model. A first analysis models the interpreters' output and compares it with non-interpreted texts; a second analysis measures the effect of source-text features. The results demonstrate that interpreters produce significantly more uh(m)s than non-interpreters and that this difference is mainly due to the effect of lexical density on the output side. The main source-text predictor of uh(m)s in the target texts was shown to be the delivery rate of the source text. At a more lenient significance level, the second analysis also revealed an increasing effect of numerals in the source texts and a decreasing effect of numerals in the target texts.
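A rate model of this kind is typically a Poisson regression with text length as the exposure (offset), so that counts of uh(m) are modelled as a rate per word. The sketch below fits such a model by iteratively reweighted least squares on simulated data; the single predictor (lexical density) and all numeric values are illustrative assumptions, not the article's actual model specification or results:

```python
import numpy as np

def poisson_rate_fit(X, y, exposure, n_iter=25):
    """Fit y ~ Poisson(exposure * exp(X @ beta)) by IRLS (log link, log-exposure offset)."""
    offset = np.log(exposure)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu   # working response
        W = mu                             # IRLS weights for the Poisson log link
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated texts whose uh(m) rate per word rises with lexical density.
rng = np.random.default_rng(0)
n = 500
density = rng.uniform(0.3, 0.7, n)
words = rng.integers(100, 400, n).astype(float)   # exposure: text length in words
true_rate = np.exp(-3.0 + 2.0 * density)          # uh(m)s per word
y = rng.poisson(words * true_rate)
X = np.column_stack([np.ones(n), density])        # intercept + lexical density
beta = poisson_rate_fit(X, y, exposure=words)
```

Because the offset enters with a fixed coefficient of 1, the fitted coefficients describe the disfluency rate per word rather than the raw count, which is what allows texts of different lengths to be compared directly.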

    Mapping language varieties


    The geometry of linguistic variation
